Abstract

We study an entropy measure for quantum systems that generalizes the von Neumann 
entropy as well as its classical counterpart, the Gibbs or Shannon entropy. After establishing 
a few basic properties of this generalized entropy, we show that it is closely related to 
smooth entropies, a family of entropy measures that is used to characterize a wide range 
of operational quantities. 

1 Introduction 

Entropy, originally introduced in thermodynamics, is nowadays recognized as a rather uni- 
versal concept with a variety of uses, ranging from physics and chemistry to information 
theory and the theory of computation. Besides the role it plays for foundational questions, it 
is also relevant for applications. For example, entropy is used to study the efficiency of steam 
engines, but it also occurs in formulae for the data transmission capacity of optical fibres. 

While entropy can be defined in various ways, a very common form employed for the 
study of classical systems is the Gibbs entropy or, in the context of information theory, the 
Shannon entropy [1]. It is defined for any probability distribution P as 

$$ H(P) = -\sum_x P(x) \log P(x) $$

(up to an unimportant proportionality factor). This definition has been generalized to the
von Neumann entropy [2], which is defined for density operators $\rho$,

$$ H(\rho) = -\mathrm{Tr}(\rho \log \rho) . $$

While these entropy measures have a wide range of applications, it has recently become ap- 
parent that they are not suitable for correctly characterizing operationally relevant quantities 
in general scenarios (as explained below). This has led to the development of extensions [3], 
among them the information spectrum approach [4, 5, 6] and smooth entropies [7, 8] (where the 
former can be obtained as an asymptotic limit of the latter [9]). 

The aim of this work is to study an alternative measure of entropy that generalizes von 
Neumann entropy. The generalized entropy is closely related to smooth entropies, which, in 
turn, are connected to a variety of operational quantities. 
1.1 Axiomatic and operational approach to entropy

The variety of areas and applications where entropies are used is impressive, and one may 
wonder what it is that makes entropy such a versatile concept. 

One may attempt to answer the question from an axiomatic viewpoint. Here, the idea is to 
consider (small) sets of axioms that characterize the nature of entropy. There is a vast amount 
of literature devoted to the specification of such axioms and their study [1, 10, 11, 12, 13, 14, 
15, 16]. While the choice of a set of axioms is ultimately a matter of taste, we sketch in the 
following some of the most popular axioms. We do this for the case of entropies defined on 
quantum systems, i.e., we consider functions $H$ from the set of density operators (denoted
by $\rho$) to the real numbers.

• Positivity: $H(\rho) \ge 0$.

• Invariance under isometries: $H(U \rho U^\dagger) = H(\rho)$.

• Continuity: $H$ is a continuous function of $\rho$.

• Additivity: $H(\rho_A \otimes \rho_B) = H(\rho_A) + H(\rho_B)$.

• Subadditivity: $H(\rho_{AB}) \le H(\rho_A) + H(\rho_B)$ (see footnote 1).

The (special) case of classical entropies is obtained by replacing the density operators by 
probability distributions. Note that the second axiom then reduces to the requirement that 
the entropy is invariant under permutations. 

It is easy to verify that the von Neumann entropy satisfies the above axioms. Further- 
more, it can be shown that (up to a constant factor, which may be fixed by an additional 
normalization axiom), the von Neumann entropy is essentially the only function satisfying 
the above postulates [12]. This result, as well as similar results based on slightly different
sets of axioms, nicely exposes the universal nature of entropy. Note, in particular, that the
above axioms do not refer specifically to thermodynamic or information-theoretic properties 
of a system. 

An alternative to this axiomatic approach is to relate entropy to operational quantities. In 
thermodynamics, examples of such operational quantities include measures for heat flow
or the amount of work that is transformed into heat during a given process. In information 
theory, operational quantities are, for instance, the minimum size to which the information 
generated by a source can be compressed, or the amount of uniform randomness that can be 
extracted from a non-uniform source. 

Given the very different nature of these operational quantities, it is not obvious that this 
approach can lead to a reasonable notion of entropy. One would rather expect an entire 
family of entropy measures — possibly as large as the number of different operational quan- 
tities one considers. However, there exist remarkable connections, even relating thermody- 
namic and information-theoretic quantities. For example, it follows from Landauer's princi- 
ple [17, 18] that the amount of work that can be extracted from a system is directly related to 
the size to which the information contained in it can be compressed [19, 20, 21]. 

1 Here $\rho_{AB}$ denotes a density operator on a bipartite system, and $\rho_A$ and $\rho_B$ are obtained by partial traces over
the second and first subsystem, respectively.
Recent work has shown that a large number of operational quantities can be characterized
with one single class of entropy measures. Smooth entropies (denoted by $H_{\min}^\epsilon$ and $H_{\max}^\epsilon$),
which were developed mostly within quantum information theory, are an example of such a
class. For instance, $H_{\min}^\epsilon$ quantifies the number of uniformly random (classical) bits that can
be deterministically extracted from a weak source of randomness [8, 22], and $H_{\max}^\epsilon$ quantifies
the number of bits needed to encode a given (classical) value [23]. More generally, $H_{\min}^\epsilon$ can
be used to characterize decoupling [24], a quantum version of randomness extraction [25], and
state merging [26, 27], which can be seen as the fully quantum analogue of coding [28]. Also, a
combination of $H_{\min}^\epsilon$ and $H_{\max}^\epsilon$ gives an expression for the classical capacity of a classical [29]
or a quantum [30] channel, as well as its "reverse" capacity [31]. Additional applications can
be found particularly in quantum cryptography (see, e.g., [8, 32, 33]). Smooth entropies also
have operational interpretations within thermodynamics. For example, they can be used in
a single-shot version of Landauer's principle to quantify the amount of work required by an
operation that moves a given system into a pure state [19, 20, 21].

However, smooth entropies are generally different from the von Neumann entropy ex- 
cept in special cases. This implies that many operational quantities, characterized by smooth 
entropies, are not in general accurately described by the von Neumann entropy (e.g. the 
amount of extractable randomness or the encoding length). In particular, it follows that 
some of the axioms considered above must be incompatible with the operational approach. 

This can also be seen directly, for example, for the (classical) task of randomness extraction.
Let $C(X)$ be the number of uniform bits that can be obtained by applying a function
to a random variable $X$ distributed according to $P_X$. Then the quantity $C$ automatically has
the properties one would expect from an uncertainty measure: it equals 0 if $X$ is perfectly
known, and it increases as $X$ becomes more uncertain. One may therefore interpret $C$ as an
(operationally defined) entropy measure for classical random variables.

However, while $C$ is indeed positive, invariant under permutations, and additive, it is
not subadditive. To see this, consider a random variable $R$ uniformly distributed over the
set $\{1, \ldots, 2^\ell\}$, for some large $\ell \in \mathbb{N}$. Furthermore, define the random variables $X$ and $Y$ by

$$ X = \begin{cases} R & \text{if } R \le 2^{\ell-1} \\ 0 & \text{otherwise} \end{cases} \qquad Y = \begin{cases} R & \text{if } R > 2^{\ell-1} \\ 0 & \text{otherwise.} \end{cases} $$

Since $\Pr[X = 0] = \Pr[Y = 0] = \frac{1}{2}$, it is not possible to extract more than 1 bit from either of
$X$ or $Y$ separately, i.e., $C(X) = C(Y) \le 1$. However, since the pair $(X, Y)$ is in one-to-one
relation to $R$, we have $C(XY) = C(R) = \ell$. Hence, subadditivity, $C(XY) \le C(X) + C(Y)$,
can be violated by an arbitrarily large amount (see footnote 2).
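
The counterexample can be illustrated numerically. The following is a minimal sketch, not part of the original argument: it uses the min-entropy $-\log \max_x P(x)$ as a stand-in for $C$ (an assumption, motivated by the statement above that $H_{\min}^\epsilon$ quantifies extractable randomness), and the function names are hypothetical.

```python
import math
from collections import Counter

def min_entropy(dist):
    """Return -log2 of the largest probability in a {value: probability} mapping."""
    return -math.log2(max(dist.values()))

def split_uniform(ell):
    """Distributions of X, Y and (X, Y) built from R uniform on {1, ..., 2**ell}."""
    size = 2 ** ell
    half = size // 2
    p_x, p_y, p_xy = Counter(), Counter(), Counter()
    for r in range(1, size + 1):
        x = r if r <= half else 0   # X reveals R only on the lower half
        y = r if r > half else 0    # Y reveals R only on the upper half
        p_x[x] += 1 / size
        p_y[y] += 1 / size
        p_xy[(x, y)] += 1 / size
    return p_x, p_y, p_xy

if __name__ == "__main__":
    p_x, p_y, p_xy = split_uniform(10)
    # One bit each for X and Y, but ell bits for the pair.
    print(min_entropy(p_x), min_entropy(p_y), min_entropy(p_xy))
```

In this sketch the joint value grows linearly in $\ell$ while the two marginal values stay at one bit, which mirrors the violation of subadditivity described above.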

1.2 Generalized entropy measure 

The above considerations show that an operational approach to entropies necessitates the 
use of entropy measures that are more general than those obtained by the usual axiomatic 

2 However, an inequality of similar form can be recovered; this is known as the entropy splitting lemma [34, 35].
approaches. The aim of this paper is to investigate such a generalization. The entropy measure
we consider is motivated by previous work [36, 37, 38, 39].

We consider a family of entropies, denoted $H_H^\epsilon$, parametrized by a real number $\epsilon$ from the
interval $[0, 1]$. $H_H^\epsilon$ is defined via a relative-entropy type quantity, i.e., a function that depends
on two density operators, $\rho$ and $\sigma$, similarly to the Kullback-Leibler divergence [40, 41]. This
quantity, denoted $D_H^\epsilon$, has a simple interpretation in the context of quantum hypothesis
testing [42]. Consider a measurement for distinguishing whether a system is in state $\rho$ or
$\sigma$. $D_H^\epsilon(\rho\|\sigma)$ then corresponds to the negative logarithm of the failure probability when the
system is in state $\sigma$, under the constraint that the success probability when the system is in
state $\rho$ is at least $\epsilon$ (see Section 3.1 below).

Starting from $D_H^\epsilon(\rho\|\sigma)$, it is possible to directly define a conditional entropy, $H_H^\epsilon(A|B)$,
i.e., a measure for the uncertainty of a system $A$ conditioned on a system $B$ (see Section 3.2
below). We note that, while the conditional von Neumann entropy may be defined analogously
using the Kullback-Leibler divergence, the standard expression for conditional von
Neumann entropy [43],

$$ H(A|B) = H(\rho_{AB}) - H(\rho_B) , \tag{1} $$

cannot be generalized directly. However, as shown in Section 5, $H_H^\epsilon$ satisfies a chain rule, i.e.,
an inequality which resembles (1). In addition, we show that $H_H^\epsilon$ has many desirable properties
that one would expect an entropy measure to have (see Section 3.3), for instance that
it reduces to the von Neumann entropy in the asymptotic limit (Asymptotic Equipartition
Property).

A central part of this contribution is to establish direct relations to the smooth entropy measures
$H_{\min}^\epsilon$ and $H_{\max}^\epsilon$ (Section 4). As explained above, it has been shown that these accurately
characterize a number of operational quantities, such as information compression, randomness
extraction, entanglement manipulation, and channel coding. Furthermore, they are also
relevant in the context of thermodynamics, e.g., for quantifying the amount of work that can
be extracted from a given system. The bounds derived in Section 4 imply that $H_H^\epsilon$ has a
similar operational significance.

2 Preliminaries 

2.1 Notation and Definitions 

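For a finite-dimensional Hilbert space $\mathcal{H}$, let $\mathcal{L}(\mathcal{H})$ and $\mathcal{P}(\mathcal{H})$ be the linear and positive semi-definite
operators on $\mathcal{H}$, respectively. On $\mathcal{L}(\mathcal{H})$ we employ the Hilbert-Schmidt inner product
$\langle X, Y \rangle := \mathrm{Tr}(X^\dagger Y)$. Quantum states form the set $\mathcal{S}(\mathcal{H}) = \{\rho \in \mathcal{P}(\mathcal{H}) : \mathrm{Tr}(\rho) = 1\}$, and we
define the set of sub-normalized states as $\mathcal{S}_\le(\mathcal{H}) = \{\rho \in \mathcal{P}(\mathcal{H}) : 0 < \mathrm{Tr}(\rho) \le 1\}$. To describe
multi-partite quantum systems on tensor product spaces we use capital letters and subscripts
to refer to individual subsystems or marginals. We call a state $\rho_{XB}$ classical-quantum (CQ) if
it is of the form $\rho_{XB} = \sum_x p(x) |x\rangle\langle x| \otimes \rho_B^x$ with $\rho_B^x \in \mathcal{S}(\mathcal{H}_B)$, $p(x)$ a probability distribution
and $\{|x\rangle\}$ an orthonormal basis of $\mathcal{H}_X$.

A map $\mathcal{E} : \mathcal{L}(\mathcal{H}) \to \mathcal{L}(\mathcal{H}')$ which, for any $\mathcal{H}''$, maps $\mathcal{P}(\mathcal{H} \otimes \mathcal{H}'')$ to $\mathcal{P}(\mathcal{H}' \otimes \mathcal{H}'')$ is called
a completely positive map (CPM). It is called trace-preserving if $\mathrm{Tr}(\mathcal{E}[X]) = \mathrm{Tr}(X)$ for any
$X \in \mathcal{P}(\mathcal{H})$. A unital map satisfies $\mathcal{E}(I) = I$, and a map is sub-unital if $\mathcal{E}(I) \le I$. The adjoint
$\mathcal{E}^*$ of $\mathcal{E}$ is defined by $\mathrm{Tr}(\mathcal{E}^*(Y) X) = \mathrm{Tr}(Y \mathcal{E}(X))$.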
We employ two distance measures on sub-normalized states: the purified distance $P(\rho, \sigma)$ [44,
45, 46] and the trace distance $D(\rho, \sigma) = \frac{1}{2} \|\rho - \sigma\|_1$ (where $\|\rho\|_1 = \mathrm{Tr}\sqrt{\rho^\dagger \rho}$). The purified distance
is defined in terms of the fidelity $F(\rho, \sigma) = \|\sqrt{\rho}\sqrt{\sigma}\|_1$ by $P(\rho, \sigma) = \sqrt{1 - F(\rho, \sigma)^2}$. The
purified and trace distances obey the following relation [47]: $D(\rho, \sigma) \le P(\rho, \sigma) \le \sqrt{2 D(\rho, \sigma)}$.
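
As a quick illustration of these definitions, the sketch below evaluates the three quantities for normalized classical distributions, i.e., for commuting (diagonal) states, where the fidelity reduces to $\sum_x \sqrt{p(x) q(x)}$. The function names and the example distributions are illustrative assumptions, not taken from the paper.

```python
import math

def trace_distance(p, q):
    """D(p, q) = (1/2) * sum_x |p(x) - q(x)| for diagonal states."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def fidelity(p, q):
    """F(p, q) = sum_x sqrt(p(x) q(x)) for normalized diagonal states."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def purified_distance(p, q):
    """P(p, q) = sqrt(1 - F^2); the min() guards against rounding above 1."""
    f = min(1.0, fidelity(p, q))
    return math.sqrt(1.0 - f * f)

if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]
    q = [0.2, 0.3, 0.5]
    d, pd = trace_distance(p, q), purified_distance(p, q)
    print(d, pd, math.sqrt(2 * d))
```

For this pair the trace distance is below the purified distance, which in turn is below $\sqrt{2D}$, in line with the stated relation.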

Finally, the operator inequality $A \ge B$ is taken to mean that $A - B$ is positive semi-definite,
and when comparing a matrix to a scalar we assume that the scalar is multiplied by
the identity matrix. Note also that all logarithms taken in the calculations are base 2.



2.2 Semi-Definite Programs 

Watrous has given an elegant formulation of semidefinite programs especially adapted to the
present context [48]. Here we follow his notation; see also [49] for a more extensive treatment.
A semidefinite program over $\mathcal{X} = \mathbb{C}^n$ and $\mathcal{Y} = \mathbb{C}^m$ is specified by a triple $(\Psi, A, B)$, for $A$
and $B$ Hermitian operators in $\mathcal{L}(\mathcal{X})$ and $\mathcal{L}(\mathcal{Y})$ respectively, and $\Psi : \mathcal{L}(\mathcal{X}) \to \mathcal{L}(\mathcal{Y})$ a linear,
Hermiticity-preserving operation.

This semidefinite program corresponds to two optimization problems, the so-called "pri- 
mal" and "dual" problems: 

PRIMAL                                        DUAL

minimize  $\langle A, X \rangle$              maximize  $\langle B, Y \rangle$
subj. to  $\Psi(X) \ge B$                     subj. to  $\Psi^*(Y) \le A$
          $X \in \mathcal{P}(\mathcal{X})$              $Y \in \mathcal{P}(\mathcal{Y})$

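With respect to these problems, one can define the primal and dual feasible sets $\mathcal{A}$ and $\mathcal{B}$
respectively:

$$ \mathcal{A} = \{X \in \mathcal{P}(\mathcal{X}) : \Psi(X) \ge B\}, \tag{2} $$
$$ \mathcal{B} = \{Y \in \mathcal{P}(\mathcal{Y}) : \Psi^*(Y) \le A\}. \tag{3} $$

The operators $X \in \mathcal{A}$ and $Y \in \mathcal{B}$ are then called primal and dual feasible (solutions) respectively.

To each of the primal and dual problems, the associated optimal values are defined as (see footnote 3):

$$ \alpha = \inf_{X \in \mathcal{A}} \langle A, X \rangle \quad \text{and} \quad \beta = \sup_{Y \in \mathcal{B}} \langle B, Y \rangle . $$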

Solutions to the primal and dual problems are related by the following two duality theorems: 

Theorem 1. (Weak duality). $\alpha \ge \beta$ for every semidefinite program $(\Psi, A, B)$.

Theorem 2. (Slater-type condition for strong duality). For every semi-definite program $(\Psi, A, B)$
as defined above, the following two statements hold:

1. Strict primal feasibility: If $\beta$ is finite and there exists an operator $X > 0$ s.t. $\Psi(X) > B$, then
$\alpha = \beta$ and there exists $Y \in \mathcal{B}$ s.t. $\langle B, Y \rangle = \beta$.

2. Strict dual feasibility: If $\alpha$ is finite and there exists an operator $Y > 0$ s.t. $\Psi^*(Y) < A$, then
$\alpha = \beta$ and there exists $X \in \mathcal{A}$ s.t. $\langle A, X \rangle = \alpha$.



3 If $\mathcal{A} = \emptyset$ or $\mathcal{B} = \emptyset$, we define $\alpha = \infty$ or $\beta = -\infty$ respectively.
Given strict feasibility, we obtain complementary slackness conditions linking the optimal
$X$ and $Y$ for primal and dual problems:

$$ \Psi(X) Y = B Y \quad \text{and} \quad \Psi^*(Y) X = A X. \tag{4} $$

Semidefinite programs can be solved efficiently using the ellipsoid method [50]. There 
exists an algorithm that, under certain stability conditions and bounds on the primal feasible 
and dual feasible sets, finds an approximation for the optimal value of the primal problem. 
The running time of the algorithm is bounded by a polynomial in n, m, and the logarithm of 
the desired accuracy (see [48] for more details). 
3 Relative and Conditional Entropies

We will now introduce the new family of entropy measures, as well as the smooth entropies, 
and the set of relative entropies that they are based on. 

3.1 Definition of relative entropies 

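We define the $\epsilon$-relative entropy $D_H^\epsilon(\rho\|\sigma)$ of a normalized state $\rho \in \mathcal{S}(\mathcal{H})$ relative to $\sigma \in \mathcal{P}(\mathcal{H})$ as

$$ 2^{-D_H^\epsilon(\rho\|\sigma)} := \frac{1}{\epsilon} \inf\{\langle Q, \sigma \rangle \mid 0 \le Q \le 1 \wedge \langle Q, \rho \rangle \ge \epsilon\} . \tag{5} $$

This corresponds to minimizing the probability that a strategy $Q$ to distinguish $\rho$ from $\sigma$
produces a wrong guess on input $\sigma$ while maintaining a minimum success probability $\epsilon$ to
correctly identify $\rho$. In particular, for $\epsilon = 1$, $D_H^\epsilon(\rho\|\sigma)$ is equal to Rényi's entropy [51] of order
0, $D_0(\rho\|\sigma) = -\log \mathrm{Tr}(\rho^0 \sigma)$, with $\rho^0$ the projector on the support of $\rho$ [39].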
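
For intuition, definition (5) can be evaluated in closed form when $\rho$ and $\sigma$ commute. The sketch below assumes both are diagonal in the same basis, so that they are given by a probability vector `p` and a non-negative vector `q`, and the optimization becomes a linear program solved greedily in order of increasing ratio $q(x)/p(x)$ (a Neyman-Pearson style test). The function name and this classical restriction are assumptions made for illustration; the general quantum case requires the semi-definite program of Section 3.3.1.

```python
import math

def hypothesis_testing_divergence(p, q, eps):
    """D_H^eps(p || q) in bits, following Eq. (5), for commuting states with 0 < eps <= 1."""
    # Outcomes with p(x) = 0 never help to satisfy <Q, rho> >= eps, so the test rejects them.
    order = sorted((qi / pi, pi, qi) for pi, qi in zip(p, q) if pi > 0)
    need, cost = eps, 0.0
    for _, pi, qi in order:
        take = min(1.0, need / pi)   # fraction of this outcome accepted by the test Q
        cost += take * qi            # contribution to <Q, sigma>
        need -= take * pi            # remaining weight required under rho
        if need <= 1e-15:
            break
    if cost == 0.0:
        return math.inf
    return -math.log2(cost / eps)

if __name__ == "__main__":
    print(hypothesis_testing_divergence([0.7, 0.2, 0.1], [0.1, 0.3, 0.6], 0.5))
    print(hypothesis_testing_divergence([1.0, 0.0], [0.5, 0.5], 1.0))
```

The second call uses $\epsilon = 1$ and reproduces $-\log \mathrm{Tr}(\rho^0 \sigma) = 1$ for this example, matching the order-0 statement above.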

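The relative min- and max-entropies $D_{\min}$ and $D_{\max}$ for $\rho, \sigma \in \mathcal{S}(\mathcal{H})$ are defined as follows:

$$ 2^{-D_{\min}(\rho\|\sigma)} := F(\rho, \sigma)^2 , \tag{6} $$
$$ D_{\max}(\rho\|\sigma) := \inf\{\lambda \in \mathbb{R} : 2^\lambda \sigma \ge \rho\} . \tag{7} $$

These definitions can easily be extended to subnormalized states $\rho, \sigma \in \mathcal{S}_\le(\mathcal{H})$ by using the
generalized fidelity $F(\rho, \sigma) = \mathrm{Tr}|\sqrt{\rho}\sqrt{\sigma}| + \sqrt{(1 - \mathrm{Tr}\rho)(1 - \mathrm{Tr}\sigma)}$. We define also the corresponding
smoothed quantities:

$$ D_{\min}^\epsilon(\rho\|\sigma) = \max_{\tilde\rho \in B_\epsilon(\rho)} D_{\min}(\tilde\rho\|\sigma), \tag{8} $$
$$ D_{\max}^\epsilon(\rho\|\sigma) = \min_{\tilde\rho \in B_\epsilon(\rho)} D_{\max}(\tilde\rho\|\sigma), \tag{9} $$

with $B_\epsilon(\rho) = \{\tilde\rho \in \mathcal{S}_\le(\mathcal{H}) \mid P(\rho, \tilde\rho) \le \epsilon\}$ the purified-distance ball around $\rho$, so that the
optimization is over all subnormalized states $\tilde\rho$ that are $\epsilon$-close to $\rho$ with respect to the purified distance.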



4 Note that this differs slightly from both the definitions used by Wang and Renner [38] and by Tomamichel
and Hayashi [39]. Similar formulations specific to mutual information and entanglement were previously given
respectively by Buscemi and Datta [36] and Brandão and Datta [37].
3.2 Definition of the conditional entropies

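We define the new entropy $H_H^\epsilon(A|B)_\rho$, in terms of the relative entropy we have already
introduced, as follows:

$$ H_H^\epsilon(A|B)_\rho := -D_H^\epsilon(\rho_{AB}\|I_A \otimes \rho_B) \tag{10} $$

In the smooth entropy framework, the min- and max-entropies are given by [46, 52, 53]:

$$ H_{\min}^\epsilon(A|B)_{\rho|\sigma} := -D_{\max}^\epsilon(\rho_{AB}\|I_A \otimes \sigma_B) , \tag{11} $$
$$ H_{\max}^\epsilon(A|B)_{\rho|\sigma} := -D_{\min}^\epsilon(\rho_{AB}\|I_A \otimes \sigma_B) , \tag{12} $$
$$ H_{\min}^\epsilon(A|B)_\rho := \max_{\tilde\rho \in B_\epsilon(\rho)} \sup_{\sigma_B} -D_{\max}(\tilde\rho_{AB}\|I_A \otimes \sigma_B) , \tag{13} $$
$$ H_{\max}^\epsilon(A|B)_\rho := \min_{\tilde\rho \in B_\epsilon(\rho)} \sup_{\sigma_B} -D_{\min}(\tilde\rho_{AB}\|I_A \otimes \sigma_B) . \tag{14} $$

The non-smoothed versions $H_{\min}(A|B)$ and $H_{\max}(A|B)$ are given by taking $\epsilon = 0$.

For the special case when $\epsilon \to 0$, $H_H^\epsilon(A|B)$ converges to $H_{\min}(A|B)_{\rho|\rho}$, since for the optimal
solutions to the semi-definite program as defined below $X \to 0$. In the case where
one is also not conditioning on any $B$-system (i.e. take $B$ to be a trivial system, or take
$\rho_{AB} = \rho_A \otimes \rho_B$), $H_H^\epsilon$ reduces to the min-entropy:

$$ \lim_{\epsilon \to 0} H_H^\epsilon(A)_\rho = H_{\min}(A)_\rho = -\log \|\rho_A\|_\infty . \tag{15} $$

Note also that $H_H^\epsilon$ is monotonically increasing in $\epsilon$: to see this, observe that the dual optimal
$\{\mu, X\}$ for $2^{H_H^\epsilon}$ (see below) is also feasible for $2^{H_H^{\epsilon'}}$ with $\epsilon' \ge \epsilon$.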
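
The two remarks above can be checked in a simple special case. The sketch below assumes a classical system with trivial $B$, where $2^{H_H^\epsilon(X)} = \frac{1}{\epsilon} \min \sum_x Q(x)$ subject to $\sum_x Q(x) p(x) \ge \epsilon$ and $0 \le Q(x) \le 1$, so the optimal test accepts the most likely outcomes first. The function name is hypothetical.

```python
import math

def hypothesis_testing_entropy(p, eps):
    """H_H^eps(X) in bits for a classical distribution p, with 0 < eps <= 1 and trivial B."""
    need, count = eps, 0.0
    for px in sorted(p, reverse=True):   # most likely outcomes first
        if px == 0 or need <= 1e-15:
            break
        take = min(1.0, need / px)       # fractional acceptance of this outcome
        count += take                    # contribution to Tr[Q]
        need -= take * px                # remaining weight required under p
    return math.log2(count / eps)

if __name__ == "__main__":
    p = [0.5, 0.25, 0.25]
    for eps in (0.25, 0.75, 1.0):
        print(eps, hypothesis_testing_entropy(p, eps))
```

In this classical sketch the value coincides with the min-entropy $-\log \max_x p(x)$ for every $\epsilon \le \max_x p(x)$, grows with $\epsilon$, and reaches $\log$ of the support size at $\epsilon = 1$.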

3.3 Elementary Properties 

As we are going to show in this section, the quantities $D_H^\epsilon$ and $H_H^\epsilon$ we introduced satisfy
many desirable properties one would expect from an entropy measure.

3.3.1 Properties of $D_H^\epsilon$

$D_H^\epsilon$ can be expressed in terms of a semi-definite program, meaning it can be efficiently
approximated. Due to strong duality we obtain two equivalent expressions with optimal
solutions linked by complementary slackness conditions [49]. The semi-definite program for
$2^{-D_H^\epsilon(\rho\|\sigma)}$ reads:

PRIMAL                                              DUAL

minimize  $\frac{1}{\epsilon} \mathrm{Tr}[Q \sigma]$          maximize  $\mu - \frac{\mathrm{Tr}[X]}{\epsilon}$
subj. to  $Q \le I$                                 subj. to  $\mu \rho \le \sigma + X$
          $\mathrm{Tr}[Q \rho] \ge \epsilon$                            $X \ge 0$
          $Q \ge 0$
This yields the following complementary slackness conditions for primal and dual optimal
solutions $\{Q\}$ and $\{\mu, X\}$:

$$ (\mu \rho - X) Q = \sigma Q \tag{16} $$
$$ \mathrm{Tr}[Q \rho] = \epsilon \tag{17} $$
$$ Q X = X \tag{18} $$

from which we can infer that $[Q, X] = 0$, as well as the fact that the positive part of $(\mu \rho - \sigma)$
is in the eigenspace of $Q$ with eigenvalue 1.
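
The primal and dual programs can be compared directly in the commuting case. The sketch below assumes diagonal $\rho$ and $\sigma$ (vectors `p` and `q`), evaluates the primal greedily, and evaluates the dual objective for a given $\mu$ with the smallest feasible slack $X(x) = \max(0, \mu p(x) - q(x))$. Function names and the example numbers are illustrative assumptions.

```python
def primal_value(p, q, eps):
    """(1/eps) * min <Q, sigma> subject to <Q, rho> >= eps, 0 <= Q <= 1 (diagonal case)."""
    order = sorted((qi / pi, pi, qi) for pi, qi in zip(p, q) if pi > 0)
    need, cost = eps, 0.0
    for _, pi, qi in order:
        take = min(1.0, need / pi)
        cost += take * qi
        need -= take * pi
        if need <= 1e-15:
            break
    return cost / eps

def dual_value(p, q, eps, mu):
    """mu - Tr[X]/eps with the smallest X >= 0 satisfying mu * rho <= sigma + X."""
    x = [max(0.0, mu * pi - qi) for pi, qi in zip(p, q)]
    return mu - sum(x) / eps

if __name__ == "__main__":
    p, q, eps = [0.7, 0.2, 0.1], [0.1, 0.3, 0.6], 0.8
    print(primal_value(p, q, eps))                                  # primal optimum
    print(max(dual_value(p, q, eps, m / 10) for m in range(81)))    # best dual value on a grid
```

Every dual value stays below the primal value (weak duality), and in this example the two coincide at $\mu = 1.5$, the likelihood ratio of the outcome that the optimal test accepts only partially.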

Further properties include: 

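Proposition 1 (Positivity). For any $\rho, \sigma \in \mathcal{S}(\mathcal{H})$,

$$ D_H^\epsilon(\rho\|\sigma) \ge 0, \tag{19} $$

with equality if $\rho = \sigma$.

Proof. Positivity follows immediately from the definition of $D_H^\epsilon$ by choosing $Q = \epsilon I$. Equality
is achieved if $\rho = \sigma$ because $\frac{1}{\epsilon} \min_{Q : \mathrm{Tr}(Q\rho) \ge \epsilon} \mathrm{Tr}(Q \rho) = 1$. □

Note that $D_H^\epsilon(\rho\|\sigma) = 0$ does not generally imply $\rho = \sigma$: for example, consider the case
where $\epsilon = 1$ and where $\rho$ and $\sigma$ have the same support.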

Proposition 2 (Data Processing Inequality (DPI)). For any completely positive, trace non-increasing
map $\mathcal{E}$,

$$ D_H^\epsilon(\rho\|\sigma) \ge D_H^\epsilon(\mathcal{E}(\rho)\|\mathcal{E}(\sigma)). \tag{20} $$

Proof. For a proof of this DPI, see [38]. □
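Proposition 3 (Asymptotic Equipartition Property). Let

$$ D(\rho\|\sigma) = \mathrm{Tr}[\rho(\log \rho - \log \sigma)] $$

be the relative entropy between $\rho$ and $\sigma$ [41]. Then, for any $0 < \epsilon < 1$,

$$ \lim_{n \to \infty} \frac{1}{n} D_H^\epsilon(\rho^{\otimes n}\|\sigma^{\otimes n}) = D(\rho\|\sigma). \tag{21} $$

Proof. From Stein's lemma [3, 54] it immediately follows that

$$ \lim_{n \to \infty} \frac{1}{n} D_H^\epsilon(\rho^{\otimes n}\|\sigma^{\otimes n}) = \lim_{n \to \infty} -\frac{1}{n} \log \min \frac{1}{\epsilon} \mathrm{Tr}\{\sigma^{\otimes n} Q\}, \tag{22} $$
$$ = D(\rho\|\sigma) - \lim_{n \to \infty} \frac{1}{n} \left(\log \frac{1}{\epsilon}\right) \tag{23} $$
$$ = D(\rho\|\sigma), \tag{24} $$

where the minimum is taken over $0 \le Q \le 1$ such that $\mathrm{Tr}\, Q \rho^{\otimes n} \ge \epsilon$. □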
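
The convergence in (21) can be observed numerically in a simple classical instance. The sketch below assumes $\rho$ and $\sigma$ are Bernoulli distributions with parameters $p > q$; all sequences with the same number $k$ of ones share one likelihood ratio, so the optimal test accepts type classes in order of decreasing $k$. The function names are hypothetical, and the example only suggests, rather than proves, the limit.

```python
import math

def log2_type_mass(n, k, a):
    """log2 of the probability that n Bernoulli(a) trials give exactly k ones."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return (log_comb + k * math.log(a) + (n - k) * math.log(1 - a)) / math.log(2)

def dh_rate(p, q, eps, n):
    """(1/n) D_H^eps(P^n || Q^n) in bits for Bernoulli(p) vs Bernoulli(q), assuming p > q."""
    need, cost = eps, 0.0
    for k in range(n, -1, -1):             # likelihood ratio P/Q grows with k when p > q
        pk = 2.0 ** log2_type_mass(n, k, p)
        qk = 2.0 ** log2_type_mass(n, k, q)
        if pk == 0.0:
            continue
        take = 1.0 if pk <= need else need / pk
        cost += take * qk
        need -= take * pk
        if need <= 1e-12:
            break
    return -math.log2(cost / eps) / n

def relative_entropy(p, q):
    """D(P || Q) in bits for Bernoulli(p) and Bernoulli(q)."""
    return p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))

if __name__ == "__main__":
    for n in (25, 100, 400):
        print(n, dh_rate(0.5, 0.1, 0.9, n), relative_entropy(0.5, 0.1))
```

The rate approaches $D(P\|Q)$ slowly as $n$ grows, consistent with a correction that vanishes in the limit.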
3.3.2 Properties of $H_H^\epsilon$

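Proposition 4 (Bounds). For $\rho_{AB}$ an arbitrary normalized quantum state and $\rho_{XB}$ a classical-quantum
state,

$$ -\log|A| \le H_H^\epsilon(A|B)_\rho \le \log|A|, \tag{25} $$
$$ 0 \le H_H^\epsilon(X|B)_\rho \le \log|X|. \tag{26} $$

For classical-quantum states, $H_H^\epsilon(X|B) = 0$ if $X$ is completely determined by $B$ (so that $\mathrm{Tr}(\rho_B^x \rho_B^{x'}) = 0$
for any $x' \ne x$), and the entropy is maximal if $X$ is completely mixed and independent of $B$ (i.e.
$\rho_{XB} = \frac{1}{|X|} I_X \otimes \rho_B$).

Proof. Start with the upper bound on $H_H^\epsilon$, and choose $\epsilon I$ as a feasible $Q$:

$$ 2^{H_H^\epsilon(A|B)_\rho} = \min_{\mathrm{Tr}[Q_{AB} \rho_{AB}] \ge \epsilon} \frac{1}{\epsilon} \mathrm{Tr}[Q_{AB}\, I_A \otimes \rho_B] \tag{27} $$
$$ \le \frac{1}{\epsilon} \mathrm{Tr}[\epsilon I_{AB}\, I_A \otimes \rho_B] \tag{28} $$
$$ = |A|. \tag{29} $$

For the lower bound we use the inequality $|A|\, I_A \otimes \rho_B \ge \rho_{AB}$, which holds for arbitrary
quantum states $\rho_{AB}$. To establish this inequality, define the superoperator $\mathcal{E}$ as
$\mathcal{E}(\rho) = \frac{1}{d^2} \sum_{j,k} (U^j V^k) \rho (U^j V^k)^\dagger$. Here, $d = \dim(\mathcal{H})$ while $U$ and $V$ are unitary operators defined
by $U|j\rangle = |j+1\rangle$ and $V|k\rangle = \omega^k |k\rangle$, for an orthonormal basis set $\{|j\rangle\}_{j=0}^{d-1}$ and $\omega = e^{2\pi i/d}$,
where arithmetic inside the ket is taken modulo $d$. (The operators $U$ and $V$ are often called
the discrete Weyl-Heisenberg operators, as they generate a discrete projective representation
of the Heisenberg algebra.) Then it is easy to work out that $\mathcal{E} \otimes \mathcal{I}[\rho_{AB}] = \frac{1}{d} I_A \otimes \rho_B$, which by
the form of $\mathcal{E}$ implies the sought-after inequality. Then, for the optimal $Q_{AB}$ in $H_H^\epsilon(A|B)_\rho$,

$$ 2^{H_H^\epsilon(A|B)_\rho} = \frac{1}{\epsilon} \mathrm{Tr}[Q_{AB}\, I_A \otimes \rho_B] \tag{30} $$
$$ \ge \frac{1}{\epsilon |A|} \mathrm{Tr}[Q_{AB} \rho_{AB}] \tag{31} $$
$$ \ge \frac{1}{|A|} . \tag{32} $$

Classical-quantum states $\rho_{XB}$ obey $I_X \otimes \rho_B \ge \rho_{XB}$, as $\sum_{x'} p_{x'} \rho_B^{x'} \ge p_x \rho_B^x$ for all $x$. This
implies $H_H^\epsilon(X|B)_\rho \ge 0$ by the same argument.

That the extremal cases are reached for the described cases follows immediately from the
respective definitions of $\rho_{XB}$ and $H_H^\epsilon$. □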
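
The twirling identity used in the proof can be verified numerically for a single system (the case of trivial $B$), where averaging over all $U^j V^k$ should return $\mathrm{Tr}(\rho)\, I/d$. The sketch below uses plain nested lists of complex numbers; the helper names and the example matrix are assumptions made for illustration.

```python
import cmath

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def dagger(a):
    """Conjugate transpose of a square matrix."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def weyl_twirl(rho):
    """Average rho over all d*d operators U^j V^k (shift and clock matrices)."""
    d = len(rho)
    omega = cmath.exp(2j * cmath.pi / d)
    shift = [[1.0 if i == (j + 1) % d else 0.0 for j in range(d)] for i in range(d)]  # U|j> = |j+1>
    clock = [[omega ** i if i == j else 0.0 for j in range(d)] for i in range(d)]     # V|k> = w^k |k>
    u_pow = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]            # U^0 = identity
    total = [[0j] * d for _ in range(d)]
    for _j in range(d):
        w = u_pow                            # w runs over U^j V^k for k = 0, ..., d-1
        for _k in range(d):
            term = matmul(matmul(w, rho), dagger(w))
            total = [[total[r][c] + term[r][c] / d ** 2 for c in range(d)] for r in range(d)]
            w = matmul(w, clock)
        u_pow = matmul(u_pow, shift)
    return total

if __name__ == "__main__":
    rho = [[0.5, 0.2, 0.1j], [0.2, 0.3, 0.0], [-0.1j, 0.0, 0.2]]
    for row in weyl_twirl(rho):
        print([round(abs(entry), 6) for entry in row])
```

Averaging over the clock operators removes the off-diagonal entries, and averaging over the shifts equalizes the diagonal, which together give the maximally mixed output.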

Similarly to $D_H^\epsilon$, $H_H^\epsilon$ also satisfies a data processing inequality (see footnote 5).

Proposition 5 (Data Processing Inequality). For any $\rho_{AB} \in \mathcal{S}(\mathcal{H}_{AB})$, let $\mathcal{E} : A \to A'$ be a
sub-unital TP-CPM, and $\mathcal{F} : B \to B'$ be a TP-CPM. Then, for $\tau_{A'B'} = \mathcal{E} \circ \mathcal{F}(\rho_{AB})$,

$$ H_H^\epsilon(A|B)_\rho \le H_H^\epsilon(A'|B')_\tau \tag{33} $$



5 This proof is adapted from the DPI proof for a differently defined $H^\epsilon$ in Tomamichel and Hayashi [39].
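Proof. Let $\{\mu, X_{AB}\}$ be dual-optimal for $H_H^\epsilon(A|B)_\rho$. Starting from $\mu \rho_{AB} \le I_A \otimes \rho_B + X_{AB}$
and applying $\mathcal{E} \circ \mathcal{F}$ to both sides of the inequality yields:

$$ \mu \tau_{A'B'} \le \mathcal{E}(I_A) \otimes \tau_{B'} + \mathcal{E} \circ \mathcal{F}(X_{AB}) \le I_{A'} \otimes \tau_{B'} + \mathcal{E} \circ \mathcal{F}(X_{AB}). \tag{34} $$

Hence, $\{\mu, \mathcal{E} \circ \mathcal{F}(X_{AB})\}$ is dual feasible for $H_H^\epsilon(A'|B')_\tau$ and
$2^{H_H^\epsilon(A'|B')_\tau} \ge \mu - \mathrm{Tr}(\mathcal{E} \circ \mathcal{F}(X_{AB}))/\epsilon = 2^{H_H^\epsilon(A|B)_\rho}$. □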

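Proposition 6 (Asymptotic Equipartition Property). For any $0 < \epsilon < 1$, it holds that

$$ \lim_{n \to \infty} \frac{1}{n} H_H^\epsilon(A^n|B^n)_{\rho^{\otimes n}} = H(A|B)_\rho , \tag{35} $$

where $H(A|B)$ refers to the conditional von Neumann entropy.

Proof. Using the asymptotic property of $D_H^\epsilon$ derived from Stein's lemma above, we can show
for $H_H^\epsilon(A|B)$:

$$ \lim_{n \to \infty} \frac{1}{n} H_H^\epsilon(A^n|B^n)_{\rho^{\otimes n}} = \lim_{n \to \infty} -\frac{1}{n} D_H^\epsilon(\rho_{AB}^{\otimes n}\|(I_A \otimes \rho_B)^{\otimes n}) \tag{36} $$
$$ = -D(\rho_{AB}\|I_A \otimes \rho_B) \tag{37} $$
$$ = -\mathrm{Tr}\, \rho_{AB} (\log \rho_{AB} - \log I_A \otimes \rho_B) \tag{38} $$
$$ = H(AB) + \mathrm{Tr}(\rho_B \log \rho_B) \tag{39} $$
$$ = H(AB) - H(B) \tag{40} $$
$$ = H(A|B) . \tag{41} $$

□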



4 Relation to (relative) min- and max-entropies 

The following propositions relate the new quantities to smooth entropies. This guarantees
an operational significance for $D_H^\epsilon$ and $H_H^\epsilon$ (see Section 1.1).

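Proposition 7. Let $\rho \in \mathcal{S}(\mathcal{H}_{AB})$, $\sigma \in \mathcal{P}(\mathcal{H}_{AB})$ and $0 < \epsilon < 1$. Then,

$$ D_{\max}^{\sqrt{2\epsilon}}(\rho\|\sigma) \le D_H^\epsilon(\rho\|\sigma) \le D_{\max}(\rho\|\sigma) \tag{42} $$
$$ H_{\min}^{\sqrt{2\epsilon}}(A|B)_\rho \ge H_H^\epsilon(A|B)_\rho \ge H_{\min}(A|B)_{\rho|\rho} \tag{43} $$

Proof. The upper bound for $D_H^\epsilon$ follows immediately from the fact that $\mu = 2^{-D_{\max}(\rho\|\sigma)}$ and
$X = 0$ are feasible for $2^{-D_H^\epsilon(\rho\|\sigma)}$ in the dual formulation. For the lower bound, let $\mu$ and $X$
be dual-optimal for $2^{-D_H^\epsilon(\rho\|\sigma)}$. Now define $G := \sigma^{1/2} (\sigma + X)^{-1/2}$ and let $\tilde\rho := G \rho G^\dagger$. It thus
follows that $\mu \tilde\rho \le \sigma$, and hence $2^{-D_{\max}(\tilde\rho\|\sigma)} \ge \mu$. Since $\mathrm{Tr}[X] \ge 0$, it holds that $\mu \ge 2^{-D_H^\epsilon(\rho\|\sigma)}$,
which implies that $2^{-D_H^\epsilon(\rho\|\sigma)} \le 2^{-D_{\max}(\tilde\rho\|\sigma)}$.

It is now left to prove that the purified distance between $\rho$ and $\tilde\rho$ does not exceed $\sqrt{2\epsilon}$.
For this we employ Lemma 3, from which we obtain the upper bound $\sqrt{\frac{2}{\mu} \mathrm{Tr}[X]}$. Together
with $0 \le \epsilon\mu - \mathrm{Tr}[X]$, this implies that $P(\rho, \tilde\rho) \le \sqrt{2\epsilon}$, which concludes the proof.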






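These bounds can now be rewritten to relate $H_H^\epsilon$ to $H_{\min}^\epsilon$. We have

$$ H_{\min}^{\sqrt{2\epsilon}}(A|B)_\rho \ge -D_{\max}^{\sqrt{2\epsilon}}(\rho_{AB}\|I_A \otimes \rho_B) \ge -D_H^\epsilon(\rho_{AB}\|I_A \otimes \rho_B) = H_H^\epsilon(A|B)_\rho . \tag{44} $$

In the other direction we find:

$$ H_H^\epsilon(A|B)_\rho = -D_H^\epsilon(\rho_{AB}\|I_A \otimes \rho_B) \ge -D_{\max}(\rho_{AB}\|I_A \otimes \rho_B) = H_{\min}(A|B)_{\rho|\rho} . \tag{45} $$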

□ 



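Proposition 8. Let $\rho \in \mathcal{S}(\mathcal{H})$ and $\sigma \in \mathcal{P}(\mathcal{H})$ have intersecting support, and $0 < \epsilon < 1$. Then,

$$ D_{\min}(\rho\|\sigma) - \log\frac{1}{\epsilon^2} \le D_H^{1-\epsilon}(\rho\|\sigma) \le D_{\min}^{\sqrt{2\epsilon}}(\rho\|\sigma) - \log\frac{1}{1-\epsilon} \tag{46} $$
$$ H_{\max}(A|B)_\rho + \log\frac{1}{\epsilon^2} \ge H_H^{1-\epsilon}(A|B)_\rho \tag{47} $$

Proof. We begin with the lower bound for $D_H^{1-\epsilon}$. Let $\mu$, $Q$, and $X$ be optimal for the primal and
dual programs for $2^{-D_H^{1-\epsilon}(\rho\|\sigma)}$ and define $Q^\perp := 1 - Q$. Complementary slackness implies
$\mathrm{Tr}[Q^\perp \rho] = \epsilon$, $QX = X$ and $Q(\mu\rho - \sigma - X) = 0$. Thus,

$$ Q(\mu\rho - \sigma - X) = Q(\mu\rho - \sigma) - X, \tag{48} $$

meaning $Q(\mu\rho - \sigma)$ is hermitian and positive semidefinite. This implies that $Q^\perp(\mu\rho - \sigma)$ is
also hermitian and $Q^\perp(\mu\rho - \sigma) \le 0$. Since $Q + Q^\perp = I$, this gives a decomposition of $(\mu\rho - \sigma)$
into positive and negative parts, and thus $|\mu\rho - \sigma| = Q(\mu\rho - \sigma) - Q^\perp(\mu\rho - \sigma)$. We can now
proceed:

$$ 2^{-\frac{1}{2} D_{\min}(\rho\|\sigma)} = F(\rho, \sigma) \tag{49} $$
$$ = \frac{1}{\sqrt{\mu}} F(\mu\rho, \sigma) \tag{50} $$
$$ \ge \frac{1}{2\sqrt{\mu}} \mathrm{Tr}[\mu\rho + \sigma - |\mu\rho - \sigma|] \tag{51} $$
$$ = \frac{1}{2\sqrt{\mu}} \mathrm{Tr}[\mu\rho + \sigma - Q(\mu\rho - \sigma) + Q^\perp(\mu\rho - \sigma)] \tag{52} $$
$$ = \frac{1}{\sqrt{\mu}} \mathrm{Tr}[Q\sigma + \mu Q^\perp \rho] \tag{53} $$
$$ \ge \sqrt{\mu}\, \mathrm{Tr}[Q^\perp \rho] \tag{54} $$
$$ = \sqrt{\mu}\, \epsilon \tag{55} $$
$$ \ge \epsilon \sqrt{\mu - \mathrm{Tr}[X]/(1-\epsilon)} \tag{56} $$
$$ = \epsilon\, 2^{-\frac{1}{2} D_H^{1-\epsilon}(\rho\|\sigma)} . \tag{57} $$

We have used that $\|\sqrt{A}\sqrt{B}\|_1 \ge \mathrm{Tr}[A + B - |A - B|]/2$ for positive semidefinite $A$, $B$ (a
variation of the trace distance bound on the fidelity; see Lemma A.2.6 of [8]).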
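Now we prove the upper bound. Let $Q$ be primal-optimal for $2^{-D_H^{1-\epsilon}(\rho\|\sigma)}$, define
$\tilde\rho := Q^{1/2} \rho Q^{1/2}$, and let $\rho_{AB}$ be an arbitrary purification of $\rho_A$. Conjugating both sides of $\rho_{AB} \le I$ by
$Q^{1/2}$, we obtain $\tilde\rho_{AB} \le Q_A \otimes I_B$.

The squared fidelity $F(\tilde\rho_A, \sigma_A)^2$ can also be written in terms of an SDP (with
$\tilde\rho_{AB}$ an arbitrary purification of $\tilde\rho_A$):

PRIMAL                                              DUAL

maximize  $\mathrm{Tr}[\tilde\rho_{AB} X_{AB}]$               minimize  $\mathrm{Tr}[Z_A \sigma_A]$
subj. to  $\mathrm{Tr}_B[X_{AB}] = \sigma_A$                   subj. to  $\tilde\rho_{AB} \le Z_A \otimes I_B$

We see that $Q$ is a feasible $Z_A$ in the SDP for $F(\tilde\rho, \sigma)^2$. Hence,

$$ 2^{-D_{\min}(\tilde\rho\|\sigma)} = F(\tilde\rho, \sigma)^2 \tag{58} $$
$$ \le \mathrm{Tr}[Q\sigma] \tag{59} $$
$$ = (1-\epsilon)\, 2^{-D_H^{1-\epsilon}(\rho\|\sigma)} , \tag{60} $$

and so $D_{\min}(\tilde\rho\|\sigma) \ge D_H^{1-\epsilon}(\rho\|\sigma) + \log\frac{1}{1-\epsilon}$.

From complementary slackness we get that $\mathrm{Tr}[Q\rho] = 1 - \epsilon$. Using Lemma 2 we obtain
$P(\rho, \tilde\rho) \le \sqrt{1 - \mathrm{Tr}[Q\rho]^2} \le \sqrt{2\epsilon}$, and the first part of the proposition follows.

Rewriting this for $H_{\max}$ and $H_H^{1-\epsilon}$ yields:

$$ H_{\max}(A|B)_\rho \ge H_{\max}(A|B)_{\rho|\rho} \tag{61} $$
$$ = -D_{\min}(\rho_{AB}\|I_A \otimes \rho_B) \tag{62} $$
$$ \ge -D_H^{1-\epsilon}(\rho_{AB}\|I_A \otimes \rho_B) - \log\frac{1}{\epsilon^2} \tag{63} $$
$$ = H_H^{1-\epsilon}(A|B)_\rho - \log\frac{1}{\epsilon^2} . \tag{64} $$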

□ 



5 Decomposition of Hypothesis Tests & Entropic Chain Rules 

In this section we prove a bound on hypothesis testing between $\rho$ and $\sigma$ in terms of hypothesis
tests between $\rho$ and some other state $\xi$, and between $\xi$ and $\sigma$. This bound yields a chain rule for
the hypothesis testing entropy.

We first require the following Lemma: 

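Lemma 1. Let $\rho, \tilde\rho \in \mathcal{S}(\mathcal{H})$ be such that $\|\rho - \tilde\rho\|_1 \le \delta$ for some $\delta \ge 0$. Then, for any $\sigma \in \mathcal{P}(\mathcal{H})$,

$$ D_H^{\epsilon+\delta}(\tilde\rho\|\sigma) + \log\frac{\epsilon}{\epsilon+\delta} \le D_H^\epsilon(\rho\|\sigma). \tag{65} $$

Proof. Let $Q$ be primal-optimal for $D_H^{\epsilon+\delta}(\tilde\rho\|\sigma)$. It follows that

$$ \mathrm{Tr}[Q\rho] = \mathrm{Tr}[Q(\rho - \tilde\rho)] + \mathrm{Tr}[Q\tilde\rho] \tag{66} $$
$$ \ge -\delta + \epsilon + \delta \tag{67} $$
$$ = \epsilon. \tag{68} $$

Hence, $Q$ is primal-feasible for $D_H^\epsilon(\rho\|\sigma)$, yielding a bound of

$$ 2^{-D_H^\epsilon(\rho\|\sigma)} \le \frac{1}{\epsilon} \mathrm{Tr}[Q\sigma] \tag{69} $$
$$ = \frac{\epsilon+\delta}{\epsilon}\, 2^{-D_H^{\epsilon+\delta}(\tilde\rho\|\sigma)} , \tag{70} $$

which proves the lemma. □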

Now we can state the main result, which deals with the relative entropy of arbitrary states
to those that are invariant under a group action. For a group $G$ and unitary representation $U_g$,
let $\mathcal{E}_G(\rho) = \frac{1}{|G|} \sum_{g \in G} U_g \rho U_g^\dagger$, which is a quantum operation. (For simplicity of presentation
we assume the group is finite, but the argument applies to continuous groups as well.)

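Proposition 9. For any $\rho, \sigma \in \mathcal{S}(\mathcal{H})$ and group $G$ such that $\sigma = \mathcal{E}_G(\sigma)$, let $\xi = \mathcal{E}_G(\rho)$. Then, for
$\epsilon, \epsilon' > 0$,

$$ D_H^{\epsilon+\sqrt{8\epsilon'}}(\rho\|\sigma) \le D_H^\epsilon(\rho\|\xi) + D_H^{\epsilon'}(\xi\|\sigma) + \log\frac{\epsilon+\sqrt{8\epsilon'}}{\epsilon} . \tag{71} $$

Proof. Let $\mu_1$ and $X_1$ be optimal in the dual program of $D_H^\epsilon(\rho\|\xi)$ and, similarly, $\mu_2$ and $X_2$ be
optimal in $D_H^{\epsilon'}(\xi\|\sigma)$. Thus, $\mu_1 \rho \le \xi + X_1$ and $\mu_2 \xi \le \sigma + X_2$. Observe that $X_2$ can be chosen
$G$-invariant without loss of generality, since $\mu_2 \xi \le \sigma + \mathcal{E}_G(X_2)$ and $\mathrm{Tr}[X_2] = \mathrm{Tr}[\mathcal{E}_G(X_2)]$.
Chaining the inequalities gives

$$ \mu_1 \mu_2 \rho \le \sigma + X_2 + \mu_2 X_1. \tag{72} $$

Next, define $T = \sigma^{1/2} (\sigma + X_2)^{-1/2}$ and conjugate both sides of the above by $T$. This gives

$$ \mu_1 \mu_2\, T \rho T^\dagger \le \sigma + \mu_2\, T X_1 T^\dagger . \tag{73} $$

Thus, the pair $\mu_1 \mu_2$, $\mu_2 T X_1 T^\dagger$ is feasible for $D_H^\epsilon(T \rho T^\dagger\|\sigma)$. Since $T$ is a contraction ($T T^\dagger \le I$),
we can proceed as follows:

$$ 2^{-D_H^\epsilon(T \rho T^\dagger\|\sigma)} \ge \mu_1 \mu_2 - \mu_2 \frac{\mathrm{Tr}[T X_1 T^\dagger]}{\epsilon} \tag{74} $$
$$ \ge \mu_1 \mu_2 - \mu_2 \frac{\mathrm{Tr}\, X_1}{\epsilon} \tag{75} $$
$$ = \mu_2\, 2^{-D_H^\epsilon(\rho\|\xi)} \tag{76} $$
$$ \ge 2^{-D_H^{\epsilon'}(\xi\|\sigma)}\, 2^{-D_H^\epsilon(\rho\|\xi)} . \tag{77} $$

Now we show that $P(\rho, T \rho T^\dagger) \le \sqrt{2\epsilon'}$, in order to invoke Lemma 1. Let the isometry
$V : \mathcal{H}_A \to \mathcal{H}_A \otimes \mathcal{H}_R$ be a Stinespring dilation of $\mathcal{E}_G$ so that
$\xi_{AR} = V_{A \to AR}\, \rho_A\, V_{A \to AR}^\dagger = \frac{1}{|G|} \sum_{g, g' \in G} U_g \rho U_{g'}^\dagger \otimes |g\rangle\langle g'|$.
The state $\xi_{AR}$ is an extension of $\xi_A$ since $\xi_A = \mathrm{Tr}_R[\xi_{AR}]$. Clearly,
$T_A \xi_{AR} T_A^\dagger$ is an extension of $T \xi T^\dagger$. We now apply Lemma 3 to the inequality $\xi \le \sigma/\mu_2 + X_2/\mu_2$
to find

$$ P(\xi_{AR}, T_A \xi_{AR} T_A^\dagger) \le \sqrt{\frac{\mathrm{Tr}[X_2]}{\mu_2} \left(2 - \frac{\mathrm{Tr}[X_2]}{\mu_2}\right)} \tag{78} $$
$$ \le \sqrt{2\epsilon'} . \tag{79} $$

This entails that

$$ P(\rho, T \rho T^\dagger) = P(V \rho_A V^\dagger, V T \rho T^\dagger V^\dagger) \tag{80} $$
$$ = P(V \rho_A V^\dagger, T V \rho V^\dagger T^\dagger) \tag{81} $$
$$ = P(\xi_{AR}, T_A \xi_{AR} T_A^\dagger) \tag{82} $$
$$ \le \sqrt{2\epsilon'} , \tag{83} $$

where we have used the fact that $T_A$ commutes with $V_{A \to AR}$. This then implies that
$\|\rho - T \rho T^\dagger\|_1 \le \sqrt{8\epsilon'}$. Lemma 1 and (77) then yield the proposition:

$$ D_H^{\epsilon+\sqrt{8\epsilon'}}(\rho\|\sigma) + \log\frac{\epsilon}{\epsilon+\sqrt{8\epsilon'}} \le D_H^\epsilon(T \rho T^\dagger\|\sigma) \tag{84} $$
$$ \le D_H^\epsilon(\rho\|\xi) + D_H^{\epsilon'}(\xi\|\sigma) . \tag{85} $$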

□ 

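Corollary 1 (Chain rule for $H_H^\epsilon$). Let $\rho_{ABC} \in \mathcal{S}(\mathcal{H}_{ABC})$ be an arbitrary normalized state, and $\epsilon, \epsilon' > 0$.
Then,

$$ H_H^{\epsilon+\sqrt{8\epsilon'}}(AB|C)_\rho \ge H_H^\epsilon(A|BC)_\rho + H_H^{\epsilon'}(B|C)_\rho - \log\frac{\epsilon+\sqrt{8\epsilon'}}{\epsilon} . \tag{86} $$

Proof. Let $G$ be the Weyl-Heisenberg group representation (as in the proof of Proposition 4) acting
on $A$, for which $\mathcal{E}_G(\rho_{ABC}) = \pi_A \otimes \rho_{BC}$, where $\pi_A = I/\dim(\mathcal{H}_A)$. Applied to the hypothesis
test between $\rho_{ABC}$ and $\pi_{AB} \otimes \rho_C$, we find

$$ D_H^{\epsilon+\sqrt{8\epsilon'}}(\rho_{ABC}\|\pi_{AB} \otimes \rho_C) $$
$$ \le D_H^\epsilon(\rho_{ABC}\|\pi_A \otimes \rho_{BC}) + D_H^{\epsilon'}(\pi_A \otimes \rho_{BC}\|\pi_{AB} \otimes \rho_C) + \log\frac{\epsilon+\sqrt{8\epsilon'}}{\epsilon} \tag{87} $$
$$ \le D_H^\epsilon(\rho_{ABC}\|\pi_A \otimes \rho_{BC}) + D_H^{\epsilon'}(\rho_{BC}\|\pi_B \otimes \rho_C) + \log\frac{\epsilon+\sqrt{8\epsilon'}}{\epsilon} . \tag{88} $$

As $H_H^\epsilon(A|B)_\sigma = \log d_A - D_H^\epsilon(\sigma_{AB}\|\pi_A \otimes \sigma_B)$, this is equivalent to the desired result. □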
A Useful Lemmas

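Lemma 2 (Lemma 7, Berta et al. [55]). For any $\rho \in \mathcal{S}_\le(\mathcal{H})$, and for any nonnegative operator
$\Pi \le I$,

$$ P(\rho, \Pi\rho\Pi) \le \frac{1}{\sqrt{\mathrm{Tr}\rho}} \sqrt{(\mathrm{Tr}\rho)^2 - (\mathrm{Tr}(\Pi^2\rho))^2} \tag{89} $$

Proof. Since $\|\sqrt{\rho}\sqrt{\Pi\rho\Pi}\|_1 = \mathrm{Tr}\sqrt{(\sqrt{\rho}\,\Pi\sqrt{\rho})(\sqrt{\rho}\,\Pi\sqrt{\rho})} = \mathrm{Tr}(\Pi\rho)$, we can write the generalized
fidelity as

$$ F(\rho, \Pi\rho\Pi) = \mathrm{Tr}(\Pi\rho) + \sqrt{(1 - \mathrm{Tr}\rho)(1 - \mathrm{Tr}(\Pi^2\rho))} . \tag{90} $$

For simplicity, introduce the following abbreviations: $r = \mathrm{Tr}\rho$, $s = \mathrm{Tr}(\Pi\rho)$ and $t = \mathrm{Tr}(\Pi^2\rho)$.
As $\rho \le 1$ and $\Pi \le 1$, trivially $0 \le t \le s \le r \le 1$. In terms of these variables, we now have that

$$ 1 - F(\rho, \Pi\rho\Pi)^2 = r + t - rt - s^2 - 2s\sqrt{(1-r)(1-t)} . \tag{91} $$

Since $P(\rho, \Pi\rho\Pi) = \sqrt{1 - F(\rho, \Pi\rho\Pi)^2}$, it is sufficient to show that $r(1 - F(\rho, \Pi\rho\Pi)^2) - r^2 + t^2 \le 0$.
This we can establish:

$$ r(1 - F(\rho, \Pi\rho\Pi)^2) - r^2 + t^2 = r\left(r + t - rt - s^2 - 2s\sqrt{(1-r)(1-t)}\right) - r^2 + t^2 \tag{92} $$
$$ \le r(r + t - rt - s^2 - 2s(1-r)) - r^2 + t^2 \tag{93} $$
$$ = rt - r^2 t + t^2 - 2rs + 2r^2 s - rs^2 \tag{94} $$
$$ \le rt - r^2 t + t^2 - 2rs + 2r^2 s - rt^2 \tag{95} $$
$$ = (1-r)(t^2 + rt - 2rs) \tag{96} $$
$$ \le (1-r)(s^2 + rs - 2rs) \tag{97} $$
$$ = (1-r)\, s\, (s-r) \tag{98} $$
$$ \le 0 \tag{99} $$

and the lemma follows. □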
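
The scalar inequality at the heart of this proof is easy to sanity-check numerically. The sketch below evaluates $r(1 - F^2) - r^2 + t^2$, with $F$ taken from (90), on a grid of values satisfying $0 \le t \le s \le r \le 1$; the function name and the grid resolution are arbitrary choices made for illustration, and a grid check is of course no substitute for the proof.

```python
import math

def lemma2_gap(r, s, t):
    """r * (1 - F^2) - r^2 + t^2 with F = s + sqrt((1 - r)(1 - t)); claimed to be <= 0."""
    f = s + math.sqrt((1 - r) * (1 - t))
    return r * (1 - f * f) - r * r + t * t

def worst_gap(steps=20):
    """Largest value of the gap over the grid 0 <= t <= s <= r <= 1."""
    return max(
        lemma2_gap(i / steps, j / steps, k / steps)
        for i in range(steps + 1)
        for j in range(i + 1)
        for k in range(j + 1)
    )

if __name__ == "__main__":
    print(worst_gap())
```

The largest value found is zero up to rounding, attained for instance at $r = s = t$, which corresponds to $\Pi$ acting as the identity on the support of $\rho$.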

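Lemma 3 (Lemma 15, Tomamichel et al. [56]; Lemma 6.1 [57]). Let $\rho \in \mathcal{S}(\mathcal{H})$, $\sigma \in \mathcal{P}(\mathcal{H})$,
$\rho \le \sigma + \Delta$, and $G := \sigma^{1/2} (\sigma + \Delta)^{-1/2}$, where the inverse is taken on the support of $\sigma$. Furthermore, let
$|\psi\rangle \in \mathcal{H} \otimes \mathcal{H}'$ be a purification of $\rho$. Then,

$$ P(\psi, (G \otimes I)\psi(G^\dagger \otimes I)) \le \sqrt{\mathrm{Tr}\Delta\,(2 - \mathrm{Tr}\Delta)} . \tag{100} $$

Proof. Let $|\psi\rangle \in \mathcal{H} \otimes \mathcal{H}'$ be a purification of $\rho$. Then, $(G \otimes I)|\psi\rangle$ is a purification of $G \rho G^\dagger$,
and with the help of Uhlmann's theorem we can bound the fidelity:

$$ F(\psi, (G \otimes I)\psi(G^\dagger \otimes I)) = |\langle\psi| G \otimes I |\psi\rangle| \tag{101} $$
$$ \ge \Re\{\mathrm{Tr}(G\rho)\} = \mathrm{Tr}(\bar{G}\rho), \tag{102} $$

with $\bar{G} := \frac{1}{2}(G + G^\dagger)$. Since $G$ is a contraction (see footnote 6), $\|G\| \le 1$. Also, $\|\bar{G}\| \le 1$ by the triangle
inequality and thus $\mathrm{Tr}(\bar{G}\rho) \le 1$. Furthermore,

$$ 1 - \mathrm{Tr}(\bar{G}\rho) = \mathrm{Tr}((I - \bar{G})\rho) \tag{103} $$
$$ \le \mathrm{Tr}(\sigma + \Delta) - \mathrm{Tr}(\bar{G}(\sigma + \Delta)) \tag{104} $$
$$ = \mathrm{Tr}(\sigma + \Delta) - \mathrm{Tr}((\sigma + \Delta)^{1/2} \sigma^{1/2}) \tag{105} $$
$$ \le \mathrm{Tr}(\Delta), \tag{106} $$

where we have used $\rho \le \sigma + \Delta$ and $\sqrt{\sigma + \Delta} \ge \sqrt{\sigma}$. Then we find

$$ P(\psi, (G \otimes I)\psi(G^\dagger \otimes I)) = \sqrt{1 - F(\psi, (G \otimes I)\psi(G^\dagger \otimes I))^2} \tag{107} $$
$$ \le \sqrt{1 - (1 - \mathrm{Tr}(\Delta))^2} \tag{108} $$
$$ = \sqrt{\mathrm{Tr}\Delta\,(2 - \mathrm{Tr}\Delta)} . \tag{109} $$

□

6 To see this, conjugate both sides of $\sigma \le \sigma + \Delta$ by $(\sigma + \Delta)^{-1/2}$ to get $G^\dagger G \le 1$.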